perf(orchestrator): add memfile dedup density threshold#2862
perf(orchestrator): add memfile dedup density threshold#2862ValentaTomas wants to merge 13 commits into
Conversation
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit 146c007. Bugbot is set up for automated code reviews on this repo. Configure here. |
❌ 7 Tests Failed:
View the full list of 7 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6d2a1dec8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
6d2a1de to
dc32af5
Compare
dc32af5 to
45404d1
Compare
levb
left a comment
There was a problem hiding this comment.
(still reviewing/understanding)
This approach scans in 1 direction and "locally" - the decision on whether to do a parent frame fetch (or store the page locally) is made solely on a single child's relationship to it. If there are parent frames with many children references, they might still be tossed out by the single-child based criteria.
For now, I am asking to move deduplication into its own file, with a more apparent hook in chunk.go for deduplication. This would make it a little easier to compare strategies if a different one appears in the future.
dobrac
left a comment
There was a problem hiding this comment.
The code flow is really complex. I would prefer simplifying (not a blocker though).
Please add more unit tests for the added functionality.
eb72ac0 to
59c2c0d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 59c2c0d958
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
59c2c0d to
e4f45c3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e4f45c3798
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Give the dedup page-kind and fetch-source enums named byte types instead of bare byte, and rename dedupFetchKey fields source/build to sourceType/buildID for clarity. No behavior change.
Return an error instead of panicking when called on a nil *Header, so callers can treat a missing header as 'no mapping' without a separate nil check.
Break the monolithic dedupCompare (cyclomatic complexity 22) into a read-only dedupConfig with per-block compareBlock and per-page classifyPage helpers, plus an in-place dedupAccum accumulator. Extract the fetch-window promotion heuristic onto a fetchWindower receiver, collapsing the duplicated benefit/cost argmax into bestByRatio and the run/key candidate scanning into the parentRuns and parentKeyGroups iterators. parentKeyGroups now yields groups ordered by first page index so promotion selection is deterministic regardless of map order. classifyPage drops its baseHeader nil check now that GetShiftedMapping is nil-safe. No behavior change.
Add unit tests for the extracted dedup helpers: parentRuns segmentation (empties bridge runs, current pages and ends bound them), parentKeyGroups deterministic ordering, fetchWindower count/bestParentRun including the key-group fallback, and compareBlock in-place accumulation.
Verify that when the promotion budget is zero, parent pages matching the base stay deduped even if the block exceeds MaxFetchWindowsPerBlock, rather than being over-promoted into the diff.
Add direct unit tests for recordBlockPages (bitmap routing, absOff page index offset, stored-page count) and fetchWindower.compact (budget guards, no-op when within window budget, parent-run promotion).
Add direct unit tests for classifyPage (empty/parent/current routing, mapping vs offset fetch-window keying, best-effort uncached skip) and the fetchWindower countAfter/bestByRatio budget and benefit gating.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d5eb29694
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Extract the dedup classification, fetch-window promotion, and drain logic out of the generic block cache into dedup.go (same package, no behavior change), leaving Cache.Dedup as the hook so the strategy is contained and easier to swap or compare later.
Adds disabled-by-default controls for memfile dedup fetch fragmentation. When enabled, dedup can promote the cheapest parent-hit pages into the current diff to keep distinct non-empty backing fetch windows under a configured budget, without ever storing empty pages.
Config keys:
maxFetchWindowsPerBlock,maxPromotedParentPagesPerBlock, andfetchRunWindowPages(all0= disabled/default window).